Chinese Base-Phrases Chunking

Authors

  • Yuqi Zhang
  • Qiang Zhou
Abstract

This paper introduces new definitions of Chinese base phrases and presents a hybrid model for chunking nine types of Chinese base phrases. The model combines a Memory-Based Learning method with a disambiguation proposal based on lexical information and grammar rules extracted from a large corpus. Our experiment achieves an F-measure of 93.4%. The significance of this research is that it provides a solid foundation for a Chinese parser.
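Memory-based learning stores training instances as-is and labels a new instance by looking up the most similar stored ones. The following is a minimal sketch of that generic idea applied to chunk tagging. It is not the authors' implementation and omits their lexical and rule-based disambiguation step. The feature window (POS tags of the previous, current, and next word), the overlap distance, the IOB chunk labels, the toy tagset, and the names `window_features` and `MemoryBasedChunker` are all hypothetical choices made for illustration.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical instance: POS tags in a +/- 1 window around a word.
Instance = Tuple[str, ...]


def window_features(pos_tags: Sequence[str], i: int, width: int = 1) -> Instance:
    """Return the POS tags within +/- width of position i, padding with '<PAD>'."""
    return tuple(
        pos_tags[j] if 0 <= j < len(pos_tags) else "<PAD>"
        for j in range(i - width, i + width + 1)
    )


def overlap_distance(a: Instance, b: Instance) -> int:
    """Overlap metric: count the feature positions whose values differ."""
    return sum(x != y for x, y in zip(a, b))


class MemoryBasedChunker:
    """Hypothetical k-nearest-neighbour chunk tagger (no feature weighting)."""

    def __init__(self, k: int = 1) -> None:
        self.k = k
        self.memory: List[Tuple[Instance, str]] = []

    def train(self, sentences: List[Tuple[List[str], List[str]]]) -> None:
        # Memory-based learning keeps every training instance verbatim.
        for pos_tags, chunk_tags in sentences:
            for i, tag in enumerate(chunk_tags):
                self.memory.append((window_features(pos_tags, i), tag))

    def classify(self, inst: Instance) -> str:
        # Sort stored instances by distance (stable sort: ties keep training
        # order), then take a majority vote among the k nearest.
        ranked = sorted(self.memory, key=lambda m: overlap_distance(inst, m[0]))
        votes = Counter(tag for _, tag in ranked[: self.k])
        return votes.most_common(1)[0][0]

    def tag(self, pos_tags: List[str]) -> List[str]:
        return [self.classify(window_features(pos_tags, i)) for i in range(len(pos_tags))]


if __name__ == "__main__":
    # Toy data: (POS tags, IOB chunk tags); tags are illustrative only.
    train_data = [
        (["a", "n", "v", "n"], ["B-NP", "I-NP", "B-VP", "B-NP"]),
        (["n", "v", "a", "n"], ["B-NP", "B-VP", "B-NP", "I-NP"]),
    ]
    chunker = MemoryBasedChunker(k=1)
    chunker.train(train_data)
    print(chunker.tag(["n", "v", "n"]))  # ['B-NP', 'B-VP', 'B-NP']
```

Practical memory-based learners usually weight features (for example by information gain) rather than counting mismatches uniformly. In a hybrid model like the one described above, a separate disambiguation stage would then correct the tagger's output.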

Similar resources

A Hybrid Approach to Chinese Base Noun Phrase Chunking

In this paper, we propose a hybrid approach to chunking Chinese base noun phrases (base NPs) that combines a Support Vector Machine (SVM) model and a Conditional Random Field (CRF) model. To compare the results of the two chunkers, we use a discriminative post-processing method whose criterion is the conditional probability produced by the CRF chunker. With respec...

A Conditional Random Field-based Traditional Chinese Base Phrase Parser for SIGHAN Bake-off 2012 Evaluation

This paper describes our system for subtask 1 (traditional Chinese parsing) of the SIGHAN Bake-off 2012 evaluation. Since this research mainly focuses on speech recognition and synthesis applications, only base phrase chunking was implemented, using three Conditional Random Field (CRF) modules: word segmentation, POS tagging, and base phrase chunking sub-systems. The official evaluatio...

Noun Phrase Chunking in Hebrew: Influence of Lexical and Morphological Features

We present a method for Noun Phrase chunking in Hebrew. We show that the traditional definition of base-NPs as nonrecursive noun phrases does not apply in Hebrew, and propose an alternative definition of Simple NPs. We review syntactic properties of Hebrew related to noun phrases, which indicate that the task of Hebrew SimpleNP chunking is harder than base-NP chunking in English. As a confirmat...

Determining the Boundaries and Types of Syntactic Phrases in Persian Texts

Text tokenization is the process of splitting text into meaningful tokens such as words, phrases, and sentences. Tokenizing text into syntactic phrases, known as chunking, is an important preprocessing step in many applications such as machine translation, information retrieval, and text-to-speech. In this paper, chunking of Farsi texts is done using statistical and learning methods and the grammat...

Chunking with Max-Margin Markov Networks

In this paper, we apply Max-Margin Markov Networks (M3Ns) to English base phrase chunking. M3Ns are a large-margin approach that combines the advantages of graphical models (such as Conditional Random Fields, CRFs) and kernel-based approaches (such as Support Vector Machines, SVMs) to solve multi-label, multi-class supervised classification problems. To show the efficiency of M3Ns, we co...

Journal title:

Volume:   Issue:

Pages: -

Publication date: 2002